Segment selection considering local degradation of naturalness in concatenative speech synthesis

Authors

Tomoki Toda

Hisashi Kawai

Minoru Tsuzaki

Kiyohiro Shikano

Abstract

In this paper, we investigate the effect of using a novel cost, RMS (Root Mean Square) cost, for segment selection for concatenative Text-to-Speech. The RMS cost is affected not only by the total degradation of naturalness but also by the local degradation of naturalness. From the results of experiments comparing this approach with segment selection based on a conventional average cost, it is found that (1) in the segment selection based on the RMS cost a larger number of concatenations causing slight local degradation are performed in order to avoid concatenations causing greater local degradation and (2) the effect of the RMS cost has little dependence on the size of the corpus. Moreover, we clarify that the naturalness of synthetic speech can be slightly improved by utilizing the RMS cost.
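The contrast between the two cost integrations can be illustrated with a minimal sketch. It assumes that each candidate segment sequence already has a list of non-negative local costs, and that the RMS cost is the square root of the mean of the squared local costs; the abstract does not spell out the formula or how local costs are computed. The function names and the numeric values are hypothetical and chosen only to show the trade-off described in finding (1).

```python
import math

def average_cost(local_costs):
    """Conventional integration: mean of the local costs."""
    return sum(local_costs) / len(local_costs)

def rms_cost(local_costs):
    """Assumed RMS integration: sqrt of the mean of squared local costs."""
    return math.sqrt(sum(c * c for c in local_costs) / len(local_costs))

# Two hypothetical candidate sequences for the same target utterance.
# 'uniform' has a slight degradation at every segment;
# 'spiky' is mostly clean but has one strongly degraded segment.
candidates = {
    "uniform": [0.3, 0.3, 0.3, 0.3],
    "spiky": [0.1, 0.1, 0.1, 0.8],
}

# Pick the candidate that minimizes each integrated cost.
best_by_average = min(candidates, key=lambda k: average_cost(candidates[k]))
best_by_rms = min(candidates, key=lambda k: rms_cost(candidates[k]))

print("average cost selects:", best_by_average)
print("RMS cost selects:", best_by_rms)
```

Because squaring weights large local costs more heavily, the RMS criterion in this toy case accepts several slightly degraded segments in order to avoid the single badly degraded one, while the average criterion does the opposite.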


Similar resources

Perceptual Evaluation of Cost for Segment Selection in Concatenative Speech Synthesis

In segment selection for concatenative Text-to-Speech (TTS), it is important to utilize a cost that corresponds to the perceptual characteristics. We clarify correspondence to the perceptual scores of the cost, and then various functions to integrate the costs are evaluated. The perceptual scores are determined from results of perceptual experiments on the naturalness of synthetic spee...


An evaluation of cost functions sensitively capturing local degradation of naturalness for segment selection in concatenative speech synthesis

In this paper, we evaluate various cost functions for selecting a segment sequence in terms of the correspondence between the cost and perceptual scores to the naturalness of synthetic speech. The results demonstrate that the conventional average cost, which shows the degradation of naturalness over the entire synthetic utterance, has better correspondence to the perceptual scores than the maxi...


Optimizing integrated cost function for segment selection in concatenative speech synthesis based on perceptual evaluations

This paper describes optimizing a cost function for segment selection in concatenative Text-to-Speech based on perceptual characteristics. We use the norm of a local cost for each segment as an integrated cost function for a segment sequence to consider both the degradation of naturalness over the entire synthetic speech and the local degradation. The cost function is optimized by adjusting not...


Steps and methods for preparing syllable and diphone speech databases for a Persian text-to-speech system

Speech databases are part of concatenative text-to-speech synthesis systems. The phonetic quality of the databases plays a significant role in the naturalness of the synthesized speech. This paper introduces two speech databases for Persian, one syllable-based and one diphone-based, and examines how they were developed, their specifications, and their relative advantages. ...


Prosody-based unit selection for Japanese speech synthesis

A corpus-based concatenative speech synthesis system using no signal processing can produce intelligible synthetic speech maintaining original voice characteristics. In such a concatenative system, it is very important to select appropriate waveform segments that are naturally close to the target prosody. But with a limited size database it can sometimes be difficult to realize natural prosody. T...




Publication date: 2003
